FullTextSearch v4.1.0 – Protected Pages Return Log...
# help-with-umbraco
o
Hello everyone,

I'm currently working with Umbraco 13.6.0 and uSkinned SiteBuilder 6.1.2, using the FullTextSearch package v4.1.0 for searching content within my site. The package has been working perfectly, providing accurate summaries with highlighted search terms, until I started protecting my blog pages with "Restrict Public Access".

Issue: Since enabling member authentication on blog posts, the search summaries now return the content of the login page instead of the actual blog text. This makes sense since the pages are protected, but I need a way for the search indexer to access the actual content while keeping the protection in place.

Setup: Using FullTextSearch with the following appsettings.json configuration:
```json
"FullTextSearch": {
  "DefaultTitleField": "nodeName",
  "FullTextPathField": "searchablePath",
  "FullTextContentField": "bodyText",
  "HighlightPattern": "<span class=\"highlight\">{0}</span>",
  "Enabled": true,
  "RenderingActiveKey": "FullTextRenderingActive",
  "XPathsToRemove": [
    "//header",
    "//head",
    "//nav",
    "//footer",
    "//aside",
    "//script"
  ]
}
```
Using FullTextSearch’s SearchService to retrieve pages:
```csharp
public PagedSearchResult SearchPages(IPublishedContent? rootNode, string searchTerm, int pageNumber, int pageSize)
{
    if (string.IsNullOrWhiteSpace(searchTerm) || rootNode == null)
    {
        return new PagedSearchResult();
    }

    string culture = Thread.CurrentThread.CurrentUICulture.Name.ToLowerInvariant();

    var search = new Our.Umbraco.FullTextSearch.Models.Search(searchTerm)
        .EnableHighlighting()
        .AddTitleProperty("nodeName")
        .AddSummaryProperty("bodyText")
        .SetSummaryLength(300)
        .SetPageLength(pageSize)
        .SetCulture(culture)
        .AddRootNodeId(rootNode.Id);

    var ftsResult = _ftsSearchService.Search(search, pageNumber);

    var paged = new PagedSearchResult
    {
        TotalItemCount = ftsResult.TotalResults,
        CurrentPage = (int)ftsResult.CurrentPage,
        PageSize = pageSize,
        TotalPages = (int)ftsResult.TotalPages,
        Results = ftsResult.Results.Select(x => new SearchResultItem
        {
            Content = x.Content,
            Title = x.Title,
            Summary = x.Summary?.ToHtmlString() ?? string.Empty,
            Score = x.Score
        }).ToList()
    };

    return paged;
}
```
The search worked perfectly before enabling "Restrict Public Access" on blog pages. After protecting the pages, summaries are incorrect, displaying only the HTML content of the login page instead of the expected blog content.

The FullTextSearch package likely fetches page content as an unauthenticated user, meaning it only sees the login page instead of the real content behind authentication. Since the indexing process isn't authenticated, the indexer stores the login page's text inside bodyText, causing incorrect search results. Disabling protection makes everything work perfectly again, so the issue is definitely related to authentication.

Is there any way to allow FullTextSearch to index protected content properly while keeping pages restricted for non-members? Can I authenticate the fetcher somehow? Has anyone encountered this issue and found a solution?

This is critical because my site is about to go live, and I need to keep the authentication in place while ensuring that search results display correct summaries instead of login page content. I appreciate any insights or suggestions! Thank you in advance for your help. 😊
s
It does fetch pages as unauthenticated (it basically just loads the page over http, and scrapes the content). You can see the fetching logic here: https://github.com/skttl/umbraco-fulltextsearch8/blob/v4/dev/src/Our.Umbraco.FullTextSearch/Rendering/HttpPageRenderer.cs Don't know how you would add authentication though, and I guess you would need to specify a member id, or member group or how else you are authenticating.
Also, if the search function is available to unauthenticated users, beware that "authenticating" the indexing process will make the indexed protected content used for summaries available to unauthenticated users too.
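One way to limit that exposure is to refuse to run the search at all for anonymous visitors. The following is only a minimal sketch under the assumption that search is meant for logged-in members only; `MemberOnlySearchGuard` is a hypothetical helper name, while `IMemberManager.IsLoggedIn()` is Umbraco's own member API.

```csharp
using Umbraco.Cms.Core.Security;

// Hypothetical helper: decides whether the current request may use search.
public class MemberOnlySearchGuard
{
    private readonly IMemberManager _memberManager;

    public MemberOnlySearchGuard(IMemberManager memberManager)
        => _memberManager = memberManager;

    // True only when the current request belongs to a logged-in member.
    public bool CanSearch() => _memberManager.IsLoggedIn();
}
```

A search method could call `CanSearch()` first and return an empty result when it is false, so summaries built from protected content never reach anonymous visitors.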
o
Yeah, I figured that it just fetches the page over HTTP, which is why it's indexing the login page instead of the actual content for protected pages. What I don’t quite understand is why this isn’t somehow accounted for in the package, considering "Restrict Public Access" is a standard Umbraco feature. I was expecting there to be some way to handle authentication, or at least a note in the documentation about this limitation. I spent quite some time trying to make it work, assuming it was possible, and now I need to figure out an alternative approach or an extension, which I’m honestly not sure how to implement. Would you have any suggestions on how to tackle this within the package’s structure? Or do you see a way it could be extended to support authenticated content indexing?
s
Because authentication is not just black or white. User A might have access to something that User B doesn't have access to. And AFAIK Examine isn't really suited for this either. If your requirements are simple enough, I would probably make my own implementation of the HttpPageRenderer and replace the original one with that. But as noted before, that would make content requiring authentication available in the index, and thus visible to unauthenticated users too.
o
My website is an intranet for a company, and the only public pages are the login page, the imprint page, the privacy policy page, and a "no access" page. Everything else is behind authentication, meaning search will only be available for authenticated users. So, I don’t have any concerns about exposing protected content in the index. That said, I’m struggling with how to correctly modify the HttpPageRenderer to handle authentication properly. I use standard Umbraco Members for authentication, and I’ve tried a few different approaches—like attempting to pass authentication cookies—but I keep running into issues. Most of the time, when I attempt to authenticate the request, the bodyText comes back empty instead of containing the expected content. I’m not sure what I’m doing wrong, and I’d really appreciate any guidance on how to properly authenticate the HTTP request within the HttpPageRenderer. Is there something specific I need to do to get the request to be recognized as an authenticated member?
And I also only have one member group, which means that every user who logs into the intranet is created as a member with access to the “IntranetMembers” group and all members are simply in there, and there won't be any other groups.
s
I don't really know how to handle authentication in the HttpClient. But your requirements sound simple enough. Are you using route hijacking (own render controllers, etc.)? If not, you could try out the RazorPageRenderer instead. I have a suspicion that it doesn't care about access control on the pages 🙂
The bodyText is always empty then...
k
We worked around this in 2 projects by replacing Umbraco's IPublicAccessChecker, for preview and indexing purposes. Of course, you need to make sure your page logic does not crash on a missing member, or you need to mock that part too.
```csharp
public class PublicAccessChecker : IPublicAccessChecker
{
    // _httpContextAccessor and _umbracoContextAccessor are constructor-injected fields (constructor omitted).
    public async Task<PublicAccessStatus> HasMemberAccessToContentAsync(int publishedContentId)
    {
        HttpContext httpContext = _httpContextAccessor.GetRequiredHttpContext();
        FullTextSearchHelper fullTextSearchHelper = httpContext.RequestServices.GetRequiredService<FullTextSearchHelper>();

        // Grant access while previewing or while FullTextSearch is rendering the page for indexing.
        if (_umbracoContextAccessor.GetRequiredUmbracoContext().InPreviewMode ||
            fullTextSearchHelper.IsRenderingActive())
        {
            return PublicAccessStatus.AccessAccepted;
        }

        // ... rest of default implementation ...
    }
}
```
and builder.Services.AddUnique<IPublicAccessChecker, PublicAccessChecker>(); in a composer
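A minimal sketch of that composer registration, assuming the custom class above is named `PublicAccessChecker` and is meant to replace Umbraco's default `IPublicAccessChecker` registration. The composer class name `PublicAccessCheckerComposer` is hypothetical.

```csharp
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;
using Umbraco.Cms.Core.Security;
using Umbraco.Extensions;

// Hypothetical composer: swaps the default access checker for the custom one.
public class PublicAccessCheckerComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        // AddUnique replaces any existing IPublicAccessChecker registration.
        // PublicAccessChecker here is the custom class from the post, not Umbraco's built-in one.
        builder.Services.AddUnique<IPublicAccessChecker, PublicAccessChecker>();
    }
}
```

Umbraco discovers composers automatically at startup, so no further wiring should be needed beyond having this class in the project.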
s
Ooh, that's nice!